Invalid Date
The formal introduction of beta regression is attributed to Ferrari and Cribari-Neto in their 2004 paper, “Beta Regression for Modelling Rates and Proportions”.
Case Study: This study focuses on a crucial aspect of academic admissions: understanding the factors influencing the chance of admission into academic programs.
Our analysis revolves around key variables: GRE scores, TOEFL scores, letters of recommendation (LOR), undergraduate CGPA, and research experience
Link appropriateness (“deviance residuals vs. indices of observation”, at least for the logit link)
Models continuous random variables and assumes values are in (0, 1), such as rates, proportions.
Dependent variable is beta-distributed.
Missing and outliers should be addressed to allow for model fitting.
Beta regression model function:
g(μ_i )=x_i^T β=n_i where β=(β_1,...,β_k )⊤ is a k × 1 vector.
(x_(i1 ),....,x_ik)⊤ is the vector of k regressors (or independent variables or covariates) and n_i is a linear predictor.
Link function g(μ)=log(μ∕(1-μ)).
The beta density formula is below. $$
f(x;p,q) =\frac{ \Gamma(p + q)}{\Gamma(p)*\Gamma(q)}x^{p-1}(1-x)^{q-1}
$$ , $$ 0 < y < 1 $$ where $p,q$ \> 0 and $\Gamma (.)$ is the Gamma function
This dataset was built with the purpose of helping students in shortlisting universities with their profiles. The predicted output gives them a fair idea about their chances for a particular university.
This dataset includes various information like GRE score, TOEFL score, university rating, SOP (Statement of Purpose), LOR (Letter of Recommendation), CGPA, research and chance of admit.
| Parameter | Range | Description |
|---|---|---|
| gre_score | 290 - 340 (340 scale) |
Quantifies a candidate’s performance on the Graduate Record Examination, with a maximum score of 340 |
| toefl_score | 92 - 120 (120 scale) |
Measures English language proficiency, scored out of a total of 120 points |
| un iv ersity_rating | 1 to 5 with 5 being the highest rating | Rates universities on a scale from 1 to 5, indicating their overall quality and reputation. |
| sop | 1 to 5 with 5 being the highest rating | E valuates the strength and quality of a candidate’s SOP on a scale of 1 to 5 |
| lor | 1 to 5 with 5 being the highest rating | E valuates the strength and quality of a candidate’s SOP and LOR on a scale of 1 to 5 |
| cgpa | 6.8 - 9.92 (10.0 scale) |
Reflects a student’s academic performance in their undergraduate studies, scored on a 10-point scale |
| research | 0 or 1 | Indicates whether a candidate has research experience (1) or not (0). |
| chance_ of_admit | 0.34 - 0.97 (0 to 1 scale) |
Represents the likelihood of a student being admitted, expressed as a decimal between 0 and 1 |
Descriptive Statistics:
Outliers:
Multicollinearity:
Correlation Analysis:
Our chosen model is specified as: chance_of_admit ~ gre_score + toefl_score + lor + cgpa + research.”
The model fitting involved iteratively refining our estimates to maximize the likelihood of observing our data given the model.
The beta regression model identified significant predictors of the chance of admission: GRE Score, TOEFL Score, Letters of Recommendation, CGPA, and Research Experience.
CGPA emerged as the strongest predictor, suggesting that academic performance in undergraduate studies is highly indicative of admission chances.
Research Experience also has a notable positive influence, underscoring the value of scholarly work in the admission process.
The study reinforces the importance of a well-rounded application, beyond test scores, highlighting the integral role of holistic review.
The model assumes a linear relationship between predictors and the logit of the admission chances, which may not capture all nuances.
The analysis is based on a specific dataset, which may limit the generalizability of the findings to other contexts or institutions.
Future research could explore the inclusion of interaction terms, non-linear relationships.